ABSTRACT

By comparing semi-analytic galaxy catalogues with data from the Sloan Digital Sky 
Survey (SDSS), we show that current galaxy formation models reproduce qualitatively 
the dependence of galaxy clustering and pairwise peculiar velocities on luminosity, but 
some subtle discrepancies with the data still remain. The comparisons are carried out 
by constructing a large set of mock galaxy redshift surveys that have the same se- 
lection function as the SDSS Data Release Four (DR4). The mock surveys are based 
on two sets of semi-analytic catalogues presented by Croton et al. and Kang et al.
From the mock catalogues, we measure the redshift-space projected two-point correlation
function $w_p(r_p)$, the power spectrum $P(k)$, and the pairwise velocity dispersion
(PVD) in Fourier space $\sigma_{12}(k)$ and in configuration space $\sigma_{12}(r_p)$, for galaxies in different
luminosity intervals. We then compare these theoretical predictions with the
measurements derived from the SDSS DR4. On large scales and for galaxies brighter 
than L*, both sets of mock catalogues agree well with the data. For fainter galax- 
ies, however, both models predict stronger clustering and higher pairwise velocities 
than observed. We demonstrate that this problem can be resolved if the fraction of 
faint satellite galaxies in massive haloes is reduced by ~ 30% compared to the model 
predictions. A direct look into the model galaxy catalogues reveals that a significant 
fraction (15%) of faint galaxies ($-18 < M_{^{0.1}r} - 5\log_{10} h < -17$) reside in haloes with
$M_{\rm vir} > 10^{13}\,M_\odot$, and this population is predominantly red in colour. These faint red
galaxies are responsible for the high PVD values of low-luminosity galaxies on small 
scales. 

Key words: galaxies: clustering - galaxies: distances and redshifts - large-scale struc- 
ture of Universe - cosmology: theory - dark matter 



1 INTRODUCTION 

The spatial and velocity distributions of galaxies have long 
served as important probes of the cosmic density field. Studies
of the two-point correlation function (2PCF) and the
pairwise velocity dispersion (PVD) reveal how galaxies are
related to the underlying mass distribution, thus providing
strong tests for theoretical models of structure and galaxy
formation (e.g. Peebles 1980; Davis et al. 1985).

Both the 2PCF and the PVD can be derived from
redshift surveys of galaxies. Studies based on early
surveys established that the correlation function of
L* galaxies is close to a power law over nearly four orders
of magnitude in amplitude (e.g. Peebles 1980). It has
also been known for decades that the measured correlation
of galaxies changes with luminosity (Xia et al. 1987;
Börner et al. 1991; Loveday et al. 1995) and morphological
type (e.g. Davis & Geller 1976). By taking advantage
of the large redshift surveys assembled in recent years,
in particular the two-degree Field Galaxy Redshift Survey
(2dFGRS; Colless et al. 2001) and the Sloan Digital
Sky Survey (SDSS; York et al. 2000), many authors have
studied the dependence of clustering on a variety of galaxy
properties (Norberg et al. 2001, 2002; Zehavi et al. 2002;
Budavári et al. 2003; Goto et al. 2003; Madgwick et al.
2003; Zehavi et al. 2005; Li et al. 2006a). These studies have
revealed that galaxies with red colours, bulge-dominated 
morphologies and spectral types indicative of old stellar 
populations reside preferentially in dense regions. Further- 
more, luminous (massive) galaxies cluster more strongly 
than less luminous (less massive) galaxies, with the lumi- 
nosity (mass) dependence becoming more pronounced for 
galaxies brighter than L* (the characteristic luminosity of
the Schechter (1976) function).

Measurements of the PVD have also been carried
out by many authors, either by modelling the redshift
distortion of the 2PCF (Davis & Peebles 1983; Mo et al.
1993; Fisher et al. 1994; Zurek et al. 1994; Marzke et al.
1995; Somerville et al. 1997), or by measuring the redshift-space
power spectrum (Jing & Börner 2001a). The early results
often varied significantly from one survey to another
(Mo et al. 1993). The PVD of galaxies in the local Universe
was not well established until the work of Jing et al.
(1998) on the Las Campanas Redshift Survey. These results
have now been confirmed by Zehavi et al. (2002, 2005) using
the SDSS and by Hawkins et al. (2003) using the 2dFGRS.
Jing & Börner (2004) (hereafter JB04) presented the first
determination of the PVD for galaxies in different luminos- 
ity intervals. This analysis led to the discovery that the PVD 
exhibits a non-monotonic dependence on galaxy luminosity, 
in that the value of $\sigma_{12}$ measured at $k = 1\,h\,{\rm Mpc}^{-1}$ decreases
as a function of increasing luminosity for galaxies
fainter than L* , but increases again for the most luminous 
galaxies in the sample. Since the PVD is an indicator of the 
depth of the local gravitational potential, this discovery im- 
plies that a significant fraction of faint galaxies are located 
in massive dark matter haloes that host galaxy groups and 
clusters, but that the majority of L* galaxies are located in 
galactic scale haloes. These results were recently confirmed
by Li et al. (2006b) (hereafter Paper II) using the second
data release (DR2) of the SDSS. These authors considered
their results in conjunction with the observed luminosity
and colour dependences of the two-point correlation function
(Li et al. 2006a) (hereafter Paper I) and concluded that the
faint red galaxy population located in rich clusters was likely 
to be responsible for the high PVD values for low-luminosity 
galaxies on small scales. 

A quantitative understanding of the luminosity de- 
pendence of the PVD requires a model linking the prop- 
erties of galaxies to the dark matter haloes in which 
they are found. One approach is to carry out N-body 
plus hydrodynamical simulations (e.g. Katz & Gunn 1991;
Cen & Ostriker 1993; Bryan et al. 1994; Navarro & White
1994; Couchman et al. 1995; Thoul & Weinberg 1995;
Abel et al. 1997; Weinberg et al. 1998; Yoshikawa et al.
2000; Springel et al. 2001). By numerically solving the gravitational
and hydrodynamical equations, galaxy formation
in an expanding universe can be simulated in a straightfor- 
ward way. However, the limited dynamic range in current 
hydro/N-body simulations and the limited understanding
of important physical processes such as star formation and
supernova feedback mean that the hydrodynamical simulations
do not in general reproduce the observed galaxy luminosity
function (e.g. Nagamine et al. 2004). As a result,
these simulations cannot be used to interpret the PVD. 

Another method is the Halo Occupation Distribution
(HOD) approach, which aims to provide a statistical description
of how dark matter haloes are populated by galaxies
(e.g. Jing et al. 1998; Peacock & Smith 2000; Seljak 2000;
Sheth et al. 2001; Berlind & Weinberg 2002; Kang et al.
2002; Cooray & Sheth 2002). In a typical HOD model, the
link between galaxies and dark matter haloes is expressed
in terms of the halo occupation function $P(N|M)$, which
gives the probability that a halo of mass M contains N
galaxies in a given luminosity range. In addition, the HOD
model must specify the spatial distribution of the galaxies
within individual haloes. An alternative way of describing
this link is in terms of the conditional luminosity function
$\Phi(L|M)$ (CLF; Yang et al. 2003), which characterises the
luminosity distribution of galaxies that reside in a halo of
mass M. The HOD approach has been used to interpret
the observed dependence of clustering on properties such
as luminosity, colour, morphology and spectral type (e.g.
Yang et al. 2003, 2004; van den Bosch et al. 2003; Yan et al.
2003, 2004; Zehavi et al. 2005; Cooray 2006).



JB04 used the HOD models of Yang et al. (2003) to
construct mock galaxy catalogues from N-body simulations. 
These catalogues were used to compare the predicted PVD 
with the observations. They found that while the model pro- 
vided a successful match to the luminosity function as well as 
the luminosity dependence of the clustering on large scales, 
it was unable to reproduce the non-monotonic luminosity 
dependence of the PVD (see Figs. 8 and 9 of JB04 and Fig.
7 of Paper II). Recently, Slosar et al. (2006) used their own
HOD models to show that the non-monotonic behaviour can
be recovered if a sufficient number of the faint galaxies are
satellite galaxies in high mass haloes. More recent studies
by Tinker et al. (2006) and van den Bosch et al. (2006) also
support this interpretation. All these studies indicate that 
the luminosity dependence of the PVD can provide a strong 
constraint on theories of galaxy formation. 

A third method is to construct semi-analytical models
(SAMs) of galaxy formation (e.g. White & Frenk 1991;
Lacey & Silk 1991; Kauffmann et al. 1993, 1997, 1999;
Cole et al. 1994, 2000; Somerville & Primack 1999). This
method incorporates parametrised models to describe the
physical processes that regulate how stars form in galax- 
ies as a function of cosmic time. The model parameters are 
chosen to reproduce key observational quantities, such as 
the luminosity functions of galaxies in various wavebands, 
the colour-magnitude relation for early-type galaxies, and 
the Tully-Fisher relation for spiral galaxies. Semi-analytic 
models represent a powerful way of predicting the observed
properties of galaxies.

Two recent SAMs have been presented by Kang et al.
(2005) (hereafter K05) and Croton et al. (2006) (hereafter
C06). Both models are based on high-resolution N-body
simulations and successfully match a variety of observational 
results. The model galaxy catalogues provided by these au- 
thors contain information not only about galaxy distribu- 
tions in phase space, but also about the observed properties 
of individual galaxies (e.g. the absolute magnitudes in the 
five photometric pass-bands of the SDSS). In this paper, we 
use these semi-analytic catalogues to study whether the luminosity
dependence of the 2PCF and the PVD of galaxies
in the local Universe can be reproduced in these models.

As discussed in Jing et al. (1998), a large set of mock
samples is essential for this comparison. The mock samples 
can be used to quantify the errors resulting from "cosmic
variance" effects and from systematics in the estimation
methods (e.g. the uncertainties in the distribution function
of peculiar velocities and in the mean infall velocities of
galaxy pairs). In this paper, we construct mock galaxy
catalogues that have the same selection effects as the SDSS
Data Release Four (DR4; Adelman-McCarthy et al. 2006).
To take into account the effect of cosmic variance, we
construct 10 mock catalogues for each SAM in which the
observer is placed at different, randomly chosen positions
within the simulation box. Using these mock catalogues and
the same estimation methods used in Papers I and II, we
measure the redshift-space projected 2PCF $w_p(r_p)$, the real-space
power spectrum $P(k)$ and the PVD $\sigma_{12}(k)$ for galaxies
in different luminosity intervals.

Observational results from the SDSS have already been 
presented in Papers I and II, using a sample of ~ 200,000 
galaxies drawn from the SDSS Data Release Two (DR2). 
Here we re-compute all the clustering statistics using the 
SDSS DR4 in order to take advantage of the larger number 
of galaxies in the newer release. We also measure $\sigma_{12}(r_p)$,
the PVD in configuration space, in order to make comparisons
with the recent HOD models of Slosar et al. (2006) and
Tinker et al. (2006).

In the following sections we describe the observational
measurements (§2), the procedure for constructing mock catalogues
(§3), the comparison between models and observations
(§4), the mock experiments to bring the models into
better agreement with the data (§5), and the nature of the
luminosity dependence of the PVD (§6). In §7, we summarise
our results, discuss the implications for the models, and sug- 
gest possible improvements both for future observations and 
models. 

Throughout this paper, we assume a cosmological
model with the density parameter $\Omega_0 = 0.3$ and cosmological
constant $\Lambda_0 = 0.7$. In the C06 SAM, the cosmological
parameters are slightly different from these adopted values.
To properly compare this model with the observations, we
calculate the positions, redshifts, and apparent magnitudes
for mock galaxies using the C06 cosmological parameters.
In the analysis of mock galaxy clustering, we use the same
cosmological parameters as in the analysis of the observational
clustering. A Hubble constant $h = 1$, in units of 100
km s$^{-1}$ Mpc$^{-1}$, is assumed throughout this paper when computing
absolute magnitudes.



2 OBSERVATIONAL MEASUREMENTS 
2.1 Samples 

The SDSS is the most ambitious optical imaging and spectroscopic
survey to date. The survey goals are to obtain
photometry of a quarter of the sky and spectra of
nearly one million objects. Imaging is obtained in the u,
g, r, i, z bands (Fukugita et al. 1996; Smith et al. 2002;
Ivezić et al. 2004) with a special purpose drift scan camera
(Gunn et al. 1998) mounted on the SDSS 2.5 meter
telescope (Gunn et al. 2006) at Apache Point Observatory.
The imaging data are photometrically (Hogg et al. 2001;
Tucker et al. 2006) and astrometrically (Pier et al. 2003)
calibrated, and used to select spectroscopic targets for the
main galaxy sample (Strauss et al. 2002), the luminous red
galaxy sample (Eisenstein et al. 2001), and the quasar sample
(Richards et al. 2002). Spectroscopic fibres are assigned
to the targets using an efficient tiling algorithm designed to
optimise completeness (Blanton et al. 2003c). The details of
the survey strategy can be found in York et al. (2000) and
an overview of the data pipelines and products is provided
in the Early Data Release paper (Stoughton et al. 2002).
More details on the photometric pipeline can be found in
Lupton et al. (2001).

Table 1. Luminosity samples selected from the NYU-VAGC
Sample dr4.

Sample   $M_{^{0.1}r}$ range   $M_{^{0.1}r}$ median   Number of galaxies
L1       [-18.0, -17.0)        -17.59                 7090
L2       [-18.5, -17.5)        -18.09                 11992
L3       [-19.0, -18.0)        -18.59                 20571
L4       [-19.5, -18.5)        -19.11                 38203
L5       [-20.0, -19.0)        -19.58                 66737
L6       [-20.5, -19.5)        -20.05                 98589
L7       [-21.0, -20.0)        -20.52                 121822
L8       [-21.5, -20.5)        -20.95                 113449
L9       [-22.0, -21.0)        -21.38                 70499
L10      [-22.5, -21.5)        -21.80                 27427
L11      [-23.0, -22.0)        -22.22                 6085

Papers I and II presented the measurements of the
redshift-space projected 2PCF $w_p(r_p)$, the real-space power
spectrum $P(k)$ and the PVD $\sigma_{12}(k)$ for different classes of
galaxies. In those papers, we used the New York University
Value Added Catalogue (NYU-VAGC), which is a catalogue
of local galaxies (mostly below $z \approx 0.3$) constructed
by Blanton et al. (2005) based on the SDSS DR2. Here we
use a new version of the NYU-VAGC (Sample dr4), which is
based on SDSS DR4, to re-determine these statistics, but as
a function of luminosity only. The NYU-VAGC is described
in detail in Blanton et al. (2005).

From Sample dr4, we construct 11 luminosity subsamples,
as listed in Table 1. We select all objects with $14.5 <
r < 17.6$ that are identified as galaxies in the Main sample
(note that the r-band magnitude has been corrected for Galactic
extinction). We also restrict the galaxies to the redshift
range $0.01 \le z \le 0.3$, and the absolute magnitude range
$-23 < M_{^{0.1}r} < -17$. Here, $M_{^{0.1}r}$ is the r-band absolute
magnitude corrected to its $z = 0.1$ value using the K-correction
code of Blanton et al. (2003a) and the luminosity
evolution model of Blanton et al. (2003b). The resulting
sample includes a total of 292,782 galaxies, which are then
divided into subsamples according to absolute magnitude.
Each subsample includes galaxies in an absolute magnitude
interval of 1 magnitude, with successive subsamples overlapping
by 0.5 magnitude. This sample selection is identical to
that in Paper I, except that we have adopted slightly different
apparent and absolute magnitude limits. We do not
consider galaxies fainter than $M_{^{0.1}r} = -17$, because the volumes
covered by such faint samples are very small and the
results are subject to large errors from cosmic variance (see
for example Fig. 6 of Paper I). The faint apparent magnitude
limit of 17.6 is chosen to yield a uniform galaxy sample
that is complete over the entire area of the survey.
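
The binning scheme described above (intervals 1 magnitude wide, stepped by 0.5 magnitude between $-23$ and $-17$) can be sketched in a few lines of Python. This is only an illustration of the subsample definition, not the code used for the measurements; the function names `luminosity_bins` and `select_subsample` are hypothetical.

```python
def luminosity_bins(faint=-17.0, bright=-23.0, width=1.0, step=0.5):
    """Return (name, bright_edge, faint_edge) for overlapping magnitude bins.

    Bins run from faint to bright, so the first entry corresponds to L1
    and the last to L11 in Table 1.
    """
    n_bins = int(round((faint - bright - width) / step)) + 1
    bins = []
    for i in range(n_bins):
        faint_edge = faint - i * step
        bins.append(("L%d" % (i + 1), faint_edge - width, faint_edge))
    return bins


def select_subsample(abs_mags, bright_edge, faint_edge):
    """Keep magnitudes in the half-open interval [bright_edge, faint_edge)."""
    return [m for m in abs_mags if bright_edge <= m < faint_edge]


bins = luminosity_bins()
for name, lo, hi in bins:
    print(name, "[%.1f, %.1f)" % (lo, hi))
```

Because neighbouring bins overlap by 0.5 magnitude, a given galaxy generally belongs to two subsamples, so the subsample counts in Table 1 sum to more than the parent sample size.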



2.2 Methods 

Our methodology for computing $w_p(r_p)$, $P(k)$ and $\sigma_{12}(k)$ in
the SDSS has been described in detail in Papers I and II. We 
present below a brief description and the reader is referred 
to the earlier papers for details. 

For each subsample, the redshift-space 2PCF $\xi^{(s)}(r_p,\pi)$
is measured using the Hamilton (1993) estimator. The
redshift-space projected 2PCF $w_p(r_p)$ is then estimated by
integrating $\xi^{(s)}(r_p,\pi)$ along the line-of-sight direction $\pi$ with
$|\pi|$ ranging from 0 to 40 $h^{-1}$ Mpc. Random samples are constructed
in which the redshift selection function is explicitly
modelled using the observed luminosity function. We have
also corrected carefully for the effect of fibre collisions (see
Li et al. 2006c, hereafter Paper III, for a detailed description).
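
As a rough illustration of these two steps, the Python sketch below evaluates the Hamilton (1993) estimator from normalised pair counts in a single bin and then projects a correlation function along the line of sight. It is a minimal sketch under stated assumptions, not the pipeline of Papers I and II: the pair counts are assumed to be already normalised, the integral over $|\pi|$ from 0 to 40 $h^{-1}$ Mpc is doubled on the assumption that $\xi^{(s)}$ is symmetric in $\pi$, and the function names are hypothetical.

```python
import math


def hamilton_xi(dd, dr, rr):
    """Hamilton (1993) estimator: xi = DD * RR / DR^2 - 1.

    dd, dr and rr are normalised data-data, data-random and
    random-random pair counts in one (r_p, pi) bin.
    """
    return dd * rr / (dr * dr) - 1.0


def projected_2pcf(xi_of_pi, pi_max=40.0, n_steps=400):
    """w_p(r_p) = 2 * integral_0^pi_max xi(r_p, pi) d(pi), midpoint rule.

    xi_of_pi is a callable giving xi at fixed r_p as a function of pi.
    """
    d_pi = pi_max / n_steps
    total = sum(xi_of_pi((i + 0.5) * d_pi) for i in range(n_steps))
    return 2.0 * total * d_pi


# Example: project a real-space power law xi(r) = (r / 5)^-1.8 at r_p = 1.
def power_law_at_rp1(pi):
    return (math.hypot(1.0, pi) / 5.0) ** -1.8


w_p = projected_2pcf(power_law_at_rp1)
```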

From $\xi^{(s)}(r_p,\pi)$, we obtain for each subsample the
redshift-space power spectrum $P^{(s)}(k,\mu)$. We then determine
simultaneously the power spectrum $P(k)$ and the PVD
$\sigma_{12}(k)$ by modelling the measured $P^{(s)}(k,\mu)$ using the relation

$$P^{(s)}(k,\mu) = P(k)\,(1+\beta\mu^2)^2\,\frac{1}{1+\frac{1}{2}\left[k\mu\sigma_{12}(k)\right]^2}. \qquad (1)$$

Here $k$ is the wavenumber, $\mu$ the cosine of the angle between
the wavevector and the line of sight, and $\beta$ the linear redshift
distortion parameter. In the equation above, the first
factor is the power spectrum, the second factor is the Kaiser
linear compression effect (Kaiser 1987), and the third factor
is the damping effect caused by the random motion of the
galaxies. In the computation, we have fixed the linear redshift
distortion parameter $\beta = 0.45$. As we have shown in
Paper II, our $\sigma_{12}(k)$ measurements are robust to reasonable
changes of the $\beta$ value.
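
The model of equation (1) is simple enough to evaluate directly. The Python sketch below is an illustration only, not the fitting code used in Papers I and II. It assumes that $\sigma_{12}$ is given in km s$^{-1}$ and $k$ in $h$ Mpc$^{-1}$, so the velocity is divided by $H_0 = 100\,h$ km s$^{-1}$ Mpc$^{-1}$ to make $k\mu\sigma_{12}$ dimensionless (the equation in the text leaves this conversion implicit). The function names are hypothetical.

```python
import math

H0 = 100.0  # km/s per (Mpc/h); assumed conversion of sigma_12 to length units


def redshift_space_power(k, mu, p_real, beta=0.45, sigma12=500.0):
    """Equation (1): real-space power times Kaiser factor times damping."""
    kaiser = (1.0 + beta * mu * mu) ** 2
    x = k * mu * sigma12 / H0
    damping = 1.0 / (1.0 + 0.5 * x * x)
    return p_real * kaiser * damping


def sigma12_from_ratio(k, mu, ratio, beta=0.45):
    """Invert equation (1) for sigma_12, given ratio = P_s(k, mu) / P(k)."""
    if mu <= 0.0 or ratio <= 0.0:
        raise ValueError("need mu > 0 and a positive power ratio")
    kaiser = (1.0 + beta * mu * mu) ** 2
    x_squared = 2.0 * (kaiser / ratio - 1.0)
    if x_squared < 0.0:
        raise ValueError("ratio exceeds the Kaiser factor; no damping solution")
    return H0 * math.sqrt(x_squared) / (k * mu)


# Round trip at a single (k, mu) point.
ratio = redshift_space_power(1.0, 0.5, 1.0, beta=0.45, sigma12=500.0)
recovered = sigma12_from_ratio(1.0, 0.5, ratio, beta=0.45)
```

In practice $P(k)$ and $\sigma_{12}(k)$ are determined together from the full $\mu$ dependence at each $k$; the single-point inversion here only shows how the damping factor encodes the velocity dispersion.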

In addition, we also compute the configuration space
PVD, $\sigma_{12}(r_p)$, which is estimated by modelling redshift distortions
in the 2PCF. This method relies on the fact that
the peculiar motions of galaxies change only their radial distances
in redshift space. Thus the information for peculiar
velocities along the line of sight can be recovered by modelling
the redshift-space 2PCF $\xi^{(s)}(r_p,\pi)$ as a convolution
of the real-space 2PCF $\xi(r)$ with the distribution function
of the pairwise velocity $f(v_{12})$:

$$\xi^{(s)}(r_p,\pi) = \int f(v_{12})\,\xi\!\left(\sqrt{r_p^2+(\pi-v_{12})^2}\right)\,dv_{12}, \qquad (2)$$

where $v_{12} = v_{12}(r_p,\pi)$ is the pairwise peculiar velocity. The
real-space correlation function $\xi(r)$ is inferred from the projected
2PCF $w_p(r_p)$, which is a simple Abel transform of
$\xi(r)$. An exponential form is adopted for $f(v_{12})$:

$$f(v_{12}) = \frac{1}{\sqrt{2}\,\sigma_{12}}\exp\left(-\frac{\sqrt{2}}{\sigma_{12}}\left|v_{12}-\overline{v}_{12}\right|\right), \qquad (3)$$

where $\overline{v}_{12}$ is the mean and $\sigma_{12}$ is the dispersion of the one-dimensional
peculiar velocities. Assuming the infall model
for $\overline{v}_{12}$ used by Jing et al. (1998), the PVD is then estimated
as a function of the projected separation $r_p$ by comparing
the observed $\xi^{(s)}(r_p,\pi)$ with the modelled one.
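
The convolution of equations (2) and (3) can be sketched numerically as follows. This is a minimal illustration, not the estimator used for the measurements. It assumes a power-law real-space correlation function (the reference line $\xi(r) = (r/5)^{-1.8}$ of Figure 1), velocities expressed in $h^{-1}$ Mpc (that is, divided by $H_0$), and a constant mean infall `v_mean` in place of the infall model of Jing et al. (1998), whose form is not given here. All function names are hypothetical.

```python
import math


def f_exponential(v, sigma12, v_mean=0.0):
    """Equation (3): exponential distribution with dispersion sigma12."""
    norm = math.sqrt(2.0) * sigma12
    return math.exp(-math.sqrt(2.0) * abs(v - v_mean) / sigma12) / norm


def xi_power_law(r, r0=5.0, gamma=1.8):
    """Assumed real-space correlation function, xi(r) = (r / r0)^-gamma."""
    return (r / r0) ** (-gamma)


def xi_redshift_space(r_p, pi, sigma12, v_mean=0.0, n_steps=4000, n_sigma=12.0):
    """Equation (2) by midpoint integration over v12 (requires r_p > 0)."""
    half_width = n_sigma * sigma12
    dv = 2.0 * half_width / n_steps
    total = 0.0
    for i in range(n_steps):
        v = v_mean - half_width + (i + 0.5) * dv
        r = math.hypot(r_p, pi - v)
        total += f_exponential(v, sigma12, v_mean) * xi_power_law(r) * dv
    return total


# sigma12 = 5 h^-1 Mpc corresponds to 500 km/s for H0 = 100 h km/s/Mpc.
smeared = xi_redshift_space(1.0, 0.0, 5.0)
```

Fitting $\sigma_{12}$ at a given $r_p$ then amounts to varying `sigma12` until the modelled $\xi^{(s)}(r_p,\pi)$ matches the measured one over the range of $\pi$ considered.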



2.3 Results 

Using the samples listed in Table 1 and the methods described
above, we have derived the projected 2PCF
$w_p(r_p)$, the power spectrum $P(k)$, the PVD in Fourier space
$\sigma_{12}(k)$, and the PVD in configuration space $\sigma_{12}(r_p)$. The
results are shown in Figure 1. Panels from left to right correspond
to six luminosity subsamples (samples L1, L3,
L5, L7, L9 and L11 in Table 1), while panels from top to
bottom correspond to the different clustering statistics. The
blue and red lines compare the results obtained from the
DR4 and DR2. The two data releases agree quite well, except
for $\sigma_{12}(k)$ in the brightest luminosity sample, where the
DR4 measurements on scales $k > 0.5\,h\,{\rm Mpc}^{-1}$ are larger,
but still within the error bars of the DR2 measurements.

A comparison of the PVD for the two different estimation
methods is shown in Figure 2 for all 11 luminosity
samples. The k-space measurements $\sigma_{12}(k)$ are plotted in
black. The PVDs in configuration space $\sigma_{12}(r_p)$ are plotted
in red as a function of $1/r_p$ and in blue as a function of
$\pi/r_p$. We see that, if the relation $k = 1/r_p$ is used in the
comparison, $\sigma_{12}(r_p)$ and $\sigma_{12}(k)$ agree well within error bars
both in shape and in amplitude, for galaxies fainter than
$-19$ or brighter than $-21$. For galaxies around L*, $\sigma_{12}(r_p)$
is systematically higher than $\sigma_{12}(k)$, by up to 30 per cent on
intermediate scales. Using $\pi/r_p$ for $k$ does not improve the
agreement between the two quantities. It is not surprising
that there are differences between the results, because the
PVD $\sigma(r)$ in 3D configuration space is not a constant. Our
results indicate that it is important that the PVDs from
the semi-analytic model be computed in exactly the same
manner as is done in the observations.



3 MOCK CATALOGUES 

3.1 SAMs and model catalogues 

In this paper, we compare the clustering and velocity statis- 
tics predicted by the semi-analytic models with the obser- 
vations by constructing a large set of mock galaxy samples 
that have the same selection effects as the SDSS DR4. We 
use two sets of semi-analytic catalogues of galaxies at $z = 0$,
provided by C06 and K05, to construct our mock catalogues. 

The semi-analytic catalogues of C06 were constructed
using the Millennium Run (Springel et al. 2005), a very
large simulation of the concordance $\Lambda$CDM cosmogony with
$10^{10}$ particles. The relevant cosmological parameters are
the density parameter $\Omega_m = 0.25$, the cosmological constant
$\Omega_\Lambda = 0.75$, and the amplitude of the power spectrum
$\sigma_8 = 0.9$. The chosen simulation volume is a periodic
box of size $L_{\rm box} = 500\,h^{-1}$ Mpc on a side, which implies
a particle mass of $8.6 \times 10^8\,h^{-1} M_\odot$. After finding haloes
and subhaloes at all output snapshots and building merging
trees that describe how haloes grow as the universe
evolves, C06 implemented a model to simulate the formation
and evolution of galaxies and their central supermassive
black holes. Their model closely matches many observations,
including the galaxy luminosity function, galaxy
colour distributions, the Tully-Fisher relation of spirals,
the colour-magnitude relation of ellipticals, the bulge mass-black
hole mass relation, and the volume-averaged cosmic
star formation rate. The models yield a number of useful
quantities that can be directly compared with observations
at different redshifts. These include positions in
phase space, total luminosities and bulge luminosities in
various bands, stellar masses, cold, hot and ejected gas
mass, black hole mass, and star formation rate. The semi-analytic
galaxy catalogue used here is publicly available at
http://www.mpa-garching.mpg.de/galform/agnpaper and
it includes a total of $\sim 9 \times 10^6$ galaxies at redshift zero
in the full simulation box. The catalogue is complete down
to $M_r - 5\log h = -16.6$ and to $M_B - 5\log h = -15.6$.

Figure 1. Clustering and velocity statistics for galaxies with various luminosities. Panels from left to right correspond to different
luminosity intervals, as indicated at the top of the figure. Panels from top to bottom correspond to different statistics: the projected
2PCF $w_p(r_p)$, the real-space power spectrum $P(k)$, the PVD measured in Fourier space $\sigma_{12}(k)$, and the PVD in configuration space
$\sigma_{12}(r_p)$. Blue and red lines are measured from the SDSS DR4 and the SDSS DR2, respectively. The dashed lines, plotted for
reference, are the same in each row (from top to bottom): the line corresponding to $\xi(r) = (r/5\,h^{-1}{\rm Mpc})^{-1.8}$, the power spectrum
$P(k) = (60/k)^{1.4}$, and (in both of the bottom two rows) the line for $\sigma_{12} = 500$ km s$^{-1}$.

Using the semi-analytical approach, K05 also produced
a set of semi-analytic galaxy catalogues by modelling
galaxy formation in a series of high-resolution N-body simulations.
The simulations used in their study were carried
out with the vectorised parallel P$^3$M code (Jing & Suto
2002), considering boxes with periodic boundary conditions
in a concordance $\Lambda$CDM cosmology. The cosmological parameters
$\Omega_m = 0.3$ and $\Omega_\Lambda = 0.7$ are slightly different from
those of C06. There are $512^3$ particles in the simulation box
of $L_{\rm box} = 100\,h^{-1}$ Mpc (L100 simulation). Although the simulation
is much smaller than that of C06, the mass resolution
is comparable. The galaxy formation model has been
updated to include supermassive black hole formation and
AGN energy feedback, as described in Kang et al. (2006).
The SAM of K05 can also match many observations,
e.g. the luminosity functions of galaxies in various wavebands
redder than the u-band, the main features in the observed
colour distribution of galaxies, the colour-magnitude
relation for elliptical galaxies in clusters, the metallicity-luminosity
relation and metallicity-rotation velocity relation
of spiral galaxies, and the gas fraction in present-day spiral
galaxies.

Figure 2. Comparison of PVD as measured in Fourier space and in configuration space for galaxies in different luminosity intervals,
as indicated in each panel. Black lines plot $\sigma_{12}$ as a function of $k$. The PVDs measured in configuration space are plotted in red as a
function of $1/r_p$ and in blue as a function of $\pi/r_p$. The dashed line in each panel represents the PVD value of 500 km s$^{-1}$.

In order to study the clustering of galaxies on large
scales, we will use a simulation of $512^3$ particles and box size
$300\,h^{-1}$ Mpc (L300 simulation) with the same cosmological



parameters as the smaller box. Because of its poor mass res- 
olution, we do not follow the formation histories of galaxies 
in this simulation. Instead, we combine the L100 simulation 
and a set of resimulations of massive clusters of ~ 10 M© 
(see K05), and use these higher resolution simulations to 
populate the dark matter haloes in the L300 simulation. In
detail, for each halo in the L300 simulation, we select a halo
from the L100 simulation or the cluster resimulations that is
closest in mass. The galaxies of this matching halo are then
placed into the L300 simulation halo. All physical properties,
as well as the positions and velocities relative to the
centre of mass of the halo, are kept the same.
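
The halo-matching step can be sketched as a nearest-neighbour search in mass. The Python example below is a minimal illustration under stated assumptions, not the procedure as actually coded by the authors: "closest in mass" is taken to mean the smallest absolute mass difference, ties are resolved in favour of the lower-mass donor, and the function names `match_haloes` and `transplant` are hypothetical.

```python
import bisect


def match_haloes(target_masses, donor_masses):
    """For each target halo, return the index of the donor halo closest in mass."""
    order = sorted(range(len(donor_masses)), key=lambda i: donor_masses[i])
    sorted_masses = [donor_masses[i] for i in order]
    matches = []
    for mass in target_masses:
        j = bisect.bisect_left(sorted_masses, mass)
        # Only the neighbours on either side of the insertion point can be closest.
        candidates = [c for c in (j - 1, j) if 0 <= c < len(sorted_masses)]
        best = min(candidates, key=lambda c: abs(sorted_masses[c] - mass))
        matches.append(order[best])
    return matches


def transplant(galaxy_offsets, target_centre):
    """Place donor galaxies in the target halo, keeping offsets from the centre."""
    return [tuple(c + o for c, o in zip(target_centre, off)) for off in galaxy_offsets]


donors = [1e14, 1e12, 5e12]            # donor halo masses (high-resolution runs)
targets = [2e12, 4e12, 6e13]           # halo masses in the large box
indices = match_haloes(targets, donors)
placed = transplant([(0.0, 0.0, 0.0), (0.1, -0.2, 0.05)], (150.0, 20.0, 75.0))
```

Velocities would be carried over in the same way, by adding the donor galaxies' velocities relative to their halo to the bulk velocity of the target halo.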

The K05 and C06 models are similar, but there are still
many differences in the details of the implementation. For
example, C06 allowed starbursts to be triggered during mi- 
nor mergers. The energy released by gas accretion onto the 
central supermassive black hole is also slightly different in 
the two implementations. The parameters of the star forma- 
tion laws and even of the cosmological models are different. 
These differences make it interesting for us to compare the 
clustering of galaxies in the two SAM implementations. 

Figure 3 shows the $^{0.1}r$-band luminosity function for
galaxies in the semi-analytical catalogues, compared to
the SDSS observations presented in Blanton et al. (2003b).
The SAMs reproduce the observed luminosity function reasonably
well, although they still predict too many faint
($M_{^{0.1}r} > -19$) and bright ($M_{^{0.1}r} < -22$) galaxies. It is
worth noting that the L100 and L300 catalogues contain
fewer L* galaxies than observed.

Figure 3. Galaxy luminosity function in the $^{0.1}r$-band. Triangles are
for the L500 SAM catalogue of Croton et al. (2006). Squares and
filled circles are respectively for the L100 and L300 catalogues of
Kang et al. (2005). The line surrounded by the magenta band
plots the observational result presented by Blanton et al. (2003b)
based on the first data release of the SDSS; the magenta band indicates
its error. Open circles with error bars represent the result
obtained in this paper with the SDSS DR4.

Figure 4. Equatorial distribution of right ascension and redshift for galaxies within 6° of the equator in the SDSS (left) and in our
mock catalogues.

We have also recomputed the observed galaxy luminosity
function using our SDSS DR4 sample. We have corrected
for the volume incompleteness by weighting each galaxy by
a factor of $V_{\rm survey}/V_{\rm max}$, where $V_{\rm survey}$ is the volume of
the sample and $V_{\rm max}$ is the maximum volume over which
the galaxy could be observed within the redshift range and
the apparent magnitude range of the sample. To determine
$V_{\rm max}$, we have used the kcorrect code of Blanton et al.
(2003a) to compute for each galaxy a $z_{\rm min}$ and a $z_{\rm max}$, the
redshifts at which the galaxy would reach the bright and
the faint r-band magnitude limits. Our measurement is also
shown in Figure 3 and it agrees quite well with the result of
Blanton et al. (2003b). The errors are estimated using the
bootstrap resampling technique (Barrow et al. 1984).
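
A minimal Python sketch of such a $V_{\rm max}$-weighted luminosity function with bootstrap errors is given below. It is not the code used for Figure 3. It assumes that $V_{\rm max}$ has already been computed for every galaxy (the kcorrect step is not reproduced), and it uses the fact that weighting by $V_{\rm survey}/V_{\rm max}$ and then dividing by $V_{\rm survey}$ is the same as summing $1/V_{\rm max}$. The function names and the toy input values are hypothetical.

```python
import random


def luminosity_function(abs_mags, v_max, edges):
    """Number density per magnitude: sum of 1/V_max in each bin / bin width."""
    phi = [0.0] * (len(edges) - 1)
    for mag, vol in zip(abs_mags, v_max):
        for i in range(len(edges) - 1):
            if edges[i] <= mag < edges[i + 1]:
                phi[i] += 1.0 / vol
                break
    return [p / (edges[i + 1] - edges[i]) for i, p in enumerate(phi)]


def bootstrap_errors(abs_mags, v_max, edges, n_boot=200, seed=0):
    """Standard deviation of the luminosity function over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(abs_mags)
    samples = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(luminosity_function(
            [abs_mags[i] for i in idx], [v_max[i] for i in idx], edges))
    errors = []
    for b in range(len(edges) - 1):
        values = [s[b] for s in samples]
        mean = sum(values) / n_boot
        errors.append((sum((x - mean) ** 2 for x in values) / (n_boot - 1)) ** 0.5)
    return errors


# Toy input: three galaxies, two magnitude bins of width 1.
mags = [-20.2, -20.7, -21.5]
vols = [100.0, 200.0, 400.0]
edges = [-22.0, -21.0, -20.0]
phi = luminosity_function(mags, vols, edges)
err = bootstrap_errors(mags, vols, edges)
```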

3.2 Constructing mock galaxy redshift surveys 

We aim to construct mock galaxy redshift surveys that have 
the same observational selection effects as the SDSS DR4. 
A detailed account of the observational selection effects accompanies
the NYU-VAGC release. Our methodology
for constructing mock SDSS catalogues has been described in
detail in Paper III. First, we create $n \times n \times n$ replications of
the simulation box, which has periodic boundary conditions,
and place a virtual observer randomly within the central
box. Here $n$ is chosen so that the required depth can be
achieved in all directions for the observer. Next, we define an
$(\alpha,\delta)$-coordinate frame and remove all galaxies that lie outside
the survey region. We then compute for each galaxy
the redshift as "seen" by the observer, the r-band apparent
magnitude and $M_{^{0.1}r}$, the r-band absolute magnitude
of the galaxy at $z = 0.1$. Finally, we mimic the position-dependent
completeness by randomly eliminating galaxies
using the completeness masks provided in Sample dr4.
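
The geometric part of this procedure can be sketched as follows. The Python example is a simplified illustration rather than the method of Paper III: it converts distance to redshift with the low-redshift Hubble law $cz = H_0 r$ instead of the full distance-redshift relation of the adopted cosmology, it omits the survey footprint and the magnitude calculations, and it represents the completeness mask as a per-galaxy probability. All names are hypothetical.

```python
import math
import random

C_KMS = 299792.458  # speed of light in km/s


def replicate_box(positions, box_size, n):
    """Tile a periodic box n x n x n times and return the shifted positions."""
    tiled = []
    for ix in range(n):
        for iy in range(n):
            for iz in range(n):
                for x, y, z in positions:
                    tiled.append((x + ix * box_size, y + iy * box_size, z + iz * box_size))
    return tiled


def place_observer(box_size, n, rng):
    """Random observer position inside the central box (n assumed odd)."""
    offset = (n // 2) * box_size
    return tuple(offset + rng.random() * box_size for _ in range(3))


def observed_redshift(pos, vel, observer, h0=100.0):
    """Redshift including the line-of-sight peculiar velocity.

    Assumes the linear Hubble law for the cosmological part (a simplification).
    """
    d = [p - o for p, o in zip(pos, observer)]
    r = math.sqrt(sum(c * c for c in d))
    v_los = sum(v * c for v, c in zip(vel, d)) / r
    z_cos = h0 * r / C_KMS
    return (1.0 + z_cos) * (1.0 + v_los / C_KMS) - 1.0


def apply_completeness(galaxies, completeness, rng):
    """Keep each galaxy with probability equal to its local completeness."""
    return [g for g, c in zip(galaxies, completeness) if rng.random() < c]


rng = random.Random(42)
tiled = replicate_box([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)], 500.0, 3)
observer = place_observer(500.0, 3, rng)
```

Since each replica carries the same peculiar velocities as the original box, the velocity array would be tiled alongside the positions before calling `observed_redshift`.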

We produce 10 mock catalogues from each SAM catalogue,
from which we then select luminosity samples in the
same way as the real sample. As pointed out by Yang et al.
(2004), the L300 catalogue is only complete down to $M_{b_J} \sim
-18.4$ (i.e. $M_{^{0.1}r} \sim -19.3$), while the L100 catalogue is complete
down to $M_{b_J} \sim -14$ (i.e. $M_{^{0.1}r} \sim -14.9$) because it is
based on a higher-resolution simulation. This implies that the
mock samples constructed from the L300 catalogue would be
incomplete out to a distance of $\sim 350\,h^{-1}$ Mpc. To overcome
this problem, we combined the L100 and L300 mock samples
by selecting galaxies with $M_{^{0.1}r} < -19$ from the L100 samples
and selecting those with $M_{^{0.1}r} > -19$ from the L300
ones.

In total we have 20 mock catalogues: 10 from the L500
catalogue and 10 from the L100 plus L300 catalogues. Figure 4
shows the equatorial distribution of galaxies in one
of the L500 mock catalogues (middle) and in one of the
L100+L300 catalogues (right), compared to the real sample
(left). The numbers of galaxies in our L500 mock catalogues,
300,000 on average with a dispersion of ~7000, are consistent
with the observational sample. In the case of L100+L300,
however, the numbers are smaller: 250,000 on average with
a dispersion of ~3700. As can be seen from Figure 3, the
model of K05 predicts fewer $L_*$ galaxies than the observations.

Figure 5. Comparison of the clustering and velocity statistics as measured from mock catalogues and as observed from SDSS DR4,
for galaxies with various luminosities. Panels from left to right correspond to different luminosity intervals, as indicated above the top
panels. Panels from top to bottom correspond to different statistics: $w_p(r_p)$, $P(k)$, $\sigma_{12}(k)$ and $\sigma_{12}(r_p)$. Red and green lines represent
the average measurement from the L500 and the L100+L300 mock samples respectively. The error bars indicate the uncertainty due to
cosmic variance as estimated from 10 different mock catalogues. Blue lines plot the observational results. The dashed lines are the same
as in Figure 1.



4 COMPARISONS BETWEEN MODELS AND 
OBSERVATIONS 

Figure 6. Clustering and velocity statistics as a function of luminosity on different scales, compared between model predictions and
observations. Panels from left to right correspond to different projected separations $r_p$ or scales $k$, as indicated above the top panels.
Panels from top to bottom correspond to different statistics: $w_p(r_p)$, $P(k)$, $\sigma_{12}(k)$ and $\sigma_{12}(r_p)$. Red and green lines are the model
predictions from the L500 and the L100+L300 mock catalogues respectively, while blue lines are for the SDSS DR4 observations. The
smaller panel below each bigger one plots the ratio of the model prediction to the observation. The PVDs measured at $k = 1\,h\,{\rm Mpc}^{-1}$
are also compared to the 2dFGRS result (black circles with error bars) presented by JB04.

For each mock sample, we measure $w_p(r_p)$, $P(k)$, $\sigma_{12}(k)$
and $\sigma_{12}(r_p)$ using the same method as for the observational
samples (§2). Figure 5 shows the results in six luminosity
intervals, the same as in Figure 1. In each panel, the average
measurement is plotted in red for the L500 mock samples and
in green for the L100+L300 samples. The error bars indicate
the uncertainty due to cosmic variance as estimated from 10
different mock samples. For comparison, the observational
measurements (blue lines in Figure 1) are also plotted in this
figure as blue lines. It should be pointed out that, for
faint galaxies, the error bars on the L100+L300 curves are
smaller than those on the L500 curves. This is because the
faint galaxies are taken from the L100 box, which artificially
reduces the cosmic variance.
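
The averaging over mock catalogues described above can be illustrated with a short Python sketch. It assumes, as one plausible reading of the text, that the plotted curve is the mean over the 10 mocks and that the error bar is the 1$\sigma$ scatter among them; the function name and the toy power-law measurements are hypothetical.

```python
import numpy as np

def mock_mean_and_scatter(measurements):
    """Mean and 1-sigma scatter over mocks; input has shape (n_mock, n_bin)."""
    arr = np.asarray(measurements, dtype=float)
    mean = arr.mean(axis=0)
    scatter = arr.std(axis=0, ddof=1)  # ASSUMPTION: sample standard deviation
    return mean, scatter

# Toy usage: 10 mocks of a power-law w_p(r_p) with 10 per cent random scatter.
rng = np.random.default_rng(0)
r_p = np.array([0.2, 1.0, 5.0, 10.0])
truth = 100.0 * r_p ** -0.8
mocks = truth * (1.0 + 0.1 * rng.standard_normal((10, r_p.size)))
mean_wp, err_wp = mock_mean_and_scatter(mocks)
```

Because mocks drawn from a small, replicated box repeatedly sample the same structures, a scatter computed this way tends to underestimate the true cosmic variance, which is the caveat raised above for the faint L100+L300 samples.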

Both models match the observations reasonably well.
The agreement is better for the two-point correlation
function than for the PVD, and it is also better
for more luminous galaxies. For galaxies brighter than $M_{^{0.1}r} = -19$,
the models reproduce the $w_p(r_p)$ and $P(k)$ measurements on
scales of $r_p > 1\,h^{-1}$ Mpc or $k < 1\,h\,{\rm Mpc}^{-1}$, but marginally
overpredict or underpredict the clustering power on smaller
scales in some cases. For galaxies fainter than $M_{^{0.1}r} = -19$, both
models predict stronger clustering on all scales than is
observed. It should be noted that the errors due to cosmic
variance are large, so the disagreement is only marginally
significant.

Figure 7. The spatial and velocity bias factors, scaled by their values at the characteristic luminosity $L_*$, as a function of luminosity.
The bias factors are estimated from the measurements of clustering/PVD at $r_p = 2.7\,h^{-1}$ Mpc or $k = 0.5\,h\,{\rm Mpc}^{-1}$, as indicated in each
panel. The green data points with error bars connected by dashed lines are for the model of Kang et al., and those in red connected
by dotted lines are for the model of Croton et al. The results from the SDSS are represented by the blue points with error bars
connected by solid lines. The open triangles are the 2dFGRS results from JB04. In the inset of the top-left panel, the
squares with error bars are also from JB04 but are obtained using $w_p(r_p)$ at $r_p = 4.89\,h^{-1}$ Mpc, and the dotted-dashed line is the fit to
the $w_p(r_p)$ measurements at $r_p = 4.89\,h^{-1}$ Mpc in the 2dFGRS, $b/b_* = 0.85 + 0.15 L/L_*$, given in Norberg et al. (2001). For clarity, the error
bars for the 2dFGRS results (open triangles), which are comparable to those in the top-left panel, are not plotted in the other panels.

Similar results are found for the PVD. On scales of
$r_p > 1\,h^{-1}$ Mpc or $k < 1\,h\,{\rm Mpc}^{-1}$ and for galaxies brighter
than $M_{^{0.1}r} = -20$, both $\sigma_{12}(r_p)$ and $\sigma_{12}(k)$ are well matched
by the models (especially the model of C06). For $L_*$ galaxies (Sample L7
in Table 1), the PVD values predicted by the model of K05
are larger than those predicted by C06, with the difference becoming
more significant on small scales. This can be understood
because K05 adopted a larger value of $\Omega_m$ in their model than
C06. For faint galaxies, the discrepancy between the models
and the observations seen in the clustering statistics is also seen
in the PVD. The model predictions are higher than the
observations, but there are large uncertainties due to cosmic
variance.

These results are shown more clearly in Figure 6,
where we have plotted $w_p(r_p)$ and $\sigma_{12}(r_p)$ at
$r_p = 0.2, 1, 5, 10\,h^{-1}$ Mpc, and $P(k)$ and $\sigma_{12}(k)$ at
$k = 0.25, 0.5, 1, 4\,h\,{\rm Mpc}^{-1}$, as a function of absolute magnitude.
The ratios of the model predictions to the SDSS observations
are also plotted. We see that both models match
the observations for high-luminosity galaxies ($M_{^{0.1}r} < -19$),
but overpredict the clustering amplitude at low luminosities.
The model of K05 better reproduces the clustering statistics,
while the model of C06 more closely matches the PVD.
Although the models predict higher pairwise velocities at faint
luminosities than seen in the observations, it is still
encouraging to see that the non-monotonic dependence of $\sigma_{12}(k)$
on luminosity is recovered by both models. This behaviour
also exists in configuration space, but is less pronounced
than in Fourier space. This is qualitatively consistent
with the HOD results of Tinker et al. (2006).

In the two top panels of Figure 7, we plot the bias
of galaxies $b$ as a function of luminosity; this has been
done in many previous papers (e.g. Norberg et al. 2002;
Tegmark et al. 2004; Zehavi et al. 2005; Paper I). Here the
bias factor is normalised by its value at the characteristic
luminosity $M_*$. In the top-left panel, we estimate $b$
using the projected 2PCF $w_p(r_p)$ at $r_p = 2.7\,h^{-1}$ Mpc as in
Zehavi et al. (2005). In the top-right panel, we use $P(k)$
at $k = 0.5\,h\,{\rm Mpc}^{-1}$ to measure $b$. The prediction of K05
is in excellent agreement with the observations, whereas C06
predicts too strong a bias for faint galaxies. In an analogous
way, we plot the velocity bias $b_v$ measured from $\sigma_{12}(r_p)$ at
$r_p = 2.7\,h^{-1}$ Mpc and from $\sigma_{12}(k)$ at $k = 0.5\,h\,{\rm Mpc}^{-1}$
(bottom panels). This plot confirms
that the non-monotonic behaviour found for $\sigma_{12}(k)$ at
$k = 1\,h\,{\rm Mpc}^{-1}$ also exists at other scales ($k = 0.5\,h\,{\rm Mpc}^{-1}$)
and in configuration space. Again we find that the model
of K05 matches the observations very well, while the C06
model has a steeper luminosity dependence than observed.
Note that when carrying out these comparisons, we have
normalised the velocity bias by its value at $M_*$. In fact,
$\sigma_{12}$ in the semi-analytic models is ~1.3 (K05) and ~1.1
(C06) times higher than in the observations (Fig. 5). One
possibility is that a $\Lambda$CDM model with a lower value of
$\Omega_m^{0.6}\sigma_8$ would fit the observational results better. However,
as we will show in the next section, this is not required by
the data.
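
To make the normalisation concrete, the sketch below computes relative bias factors from measurements at a fixed scale. The square-root relation between the spatial bias and the ratio of $w_p$ (or $P(k)$) values, and the use of a simple ratio of $\sigma_{12}$ values for the velocity bias, are assumptions based on common practice and are not spelled out in this section. The fitting formula $b/b_* = 0.85 + 0.15 L/L_*$ is the one quoted from Norberg et al. (2001). The value of $M_*$ used in the example and all input numbers are placeholders.

```python
import numpy as np

def relative_bias(stat, stat_star, power=0.5):
    """Relative bias from a statistic measured at a fixed scale.

    power=0.5 for w_p or P(k) (ASSUMPTION: b^2 scales the clustering amplitude);
    power=1.0 for the PVD (ASSUMPTION: velocity bias as a simple ratio).
    """
    return (np.asarray(stat, dtype=float) / stat_star) ** power

def norberg_fit(abs_mag, abs_mag_star):
    """b/b* = 0.85 + 0.15 L/L*, with L/L* derived from absolute magnitudes."""
    l_ratio = 10.0 ** (-0.4 * (np.asarray(abs_mag, dtype=float) - abs_mag_star))
    return 0.85 + 0.15 * l_ratio

# Toy usage with placeholder numbers (not measurements from the paper).
m_star = -20.44                       # assumed value of M*, for illustration only
wp_at_fixed_rp = np.array([30.0, 40.0, 90.0])
b_rel = relative_bias(wp_at_fixed_rp, stat_star=40.0)
bv_rel = relative_bias([450.0, 500.0, 600.0], stat_star=500.0, power=1.0)
fit_at_mstar = norberg_fit(m_star, m_star)
```

Normalising in this way removes any overall amplitude offset between models and data, which is why the ~1.1 to ~1.3 excess in $\sigma_{12}$ noted above does not show up in the relative bias plots.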

Finally, it is interesting to compare measurements of
the PVD from the SDSS and from the 2dFGRS, as this will
indicate to what extent the observational results are still
affected by variations between different regions of the sky.
In Figure 7, the bias factors from the 2dFGRS calculated
by JB04 are plotted as open triangles. From the figure, we
see that there are small but significant differences with the
SDSS results. For galaxies brighter than $2L_*$, the spatial
bias is smaller in the 2dFGRS than in the SDSS. This is
surprising because it has been claimed in the literature that
there is no significant difference between the bias factors in
the two surveys (e.g. Zehavi et al. 2005; Paper I). We note,
however, that Zehavi et al. compared their SDSS results with the
2dFGRS results of Norberg et al. (2001), where the $w_p(r_p)$
measurements were normalised at $r_p = 4.89\,h^{-1}$ Mpc and
not at $2.7\,h^{-1}$ Mpc. We have gone back to the 2dFGRS data
and estimated $b$ using $w_p(r_p)$ at $r_p = 4.89\,h^{-1}$ Mpc;
the results are plotted as squares in the inset
of the top-left panel of Figure 7. For comparison, the fitting function
of Norberg et al. is plotted as a dotted-dashed line. As can
be seen, our results calculated at $r_p = 4.89\,h^{-1}$ Mpc are
now fully consistent with Norberg et al., and are also in
agreement with the SDSS results at high luminosities. This
implies that the clustering properties of galaxies in the two
surveys have a different dependence not only on luminosity,
but also on scale. We have also studied the results at a
variety of different scales. For example, when $r_p = 1\,h^{-1}$ Mpc
or $k = 1\,h\,{\rm Mpc}^{-1}$ is used for estimating $b$, the two
surveys are fully consistent with each other for galaxies
brighter than $M_{^{0.1}r} = -19$, but for fainter galaxies, both the spatial
and the velocity biases are larger in the 2dFGRS than in the
SDSS.

In spite of these complications, we find it encouraging
that the semi-analytic models can reproduce the qualitative
shape of the luminosity dependence at magnitudes
$M_{^{0.1}r} < -19$. This success is not trivial: previous
generations of semi-analytic models could not reproduce
the strong increase of clustering at high luminosities
(Kauffmann et al. 1999; Norberg et al. 2001). For faint
galaxies, the clustering statistics are still not entirely reliable,
because the surveys are still affected by cosmic
variance. Larger samples or better estimation methods are
needed in order to make further progress.

Figure 8. Top: the ratio of the luminosity function from the L500
model catalogue to that from the SDSS DR4. Bottom: the
fraction of the satellite population in the same model catalogue.
In both panels, the dashed (solid) line represents the result before
(after) reducing the satellite fraction (see the text for details).



5 ON THE DISCREPANCIES AT THE FAINT 
END 

As we have seen in §4, there are some significant discrepancies
between the observed PVD of faint galaxies and the
results of the semi-analytic models. We have also seen
(Figure 3) that both the C06 and K05 models overpredict the
number of galaxies at the faint end of the luminosity function
($M_{^{0.1}r} > -20$). This is the luminosity regime where the
disagreement with the PVD data is worst. It is thus interesting
to ask whether reducing the number of faint galaxies
to provide a better match to the luminosity function would,
at the same time, also solve the PVD discrepancy.

To answer this question, we have performed several simple
mock experiments. In the first experiment, we randomly
remove faint galaxies with $M_{^{0.1}r} > -20$ from
the L500 model catalogue so that the resulting catalogue has
the same $^{0.1}r$-band luminosity function as the SDSS observations
presented in Blanton et al. (2003b). When computing
the luminosity function for the model catalogue, we have
corrected the $r$-band absolute magnitude $M_r$ of each model
galaxy to its $z = 0.1$ value $M_{^{0.1}r}$ in the same way as
described in §3. We construct 10 mock catalogues using
this reduced catalogue and analyse the clustering and
PVD in the same way as in §4. We find that the results are
almost the same as those presented in §4.

Figure 9. The power spectrum $P(k)$ (top panels) and the $k$-space PVD $\sigma_{12}(k)$ (bottom panels) obtained from the mock samples
constructed from the L500 model catalogue without (red) and with (green) the satellite fraction being reduced (see the text for
details). The SDSS results are plotted in blue for comparison.

Figure 10. $P(k)$ and $\sigma_{12}(k)$ as a function of luminosity on different scales, compared between the mock samples constructed from
the L500 model catalogue without (red) and with (green) the satellite fraction being reduced (see the text for details). The SDSS results
are plotted in blue for comparison. The smaller panel below each bigger one plots the ratio of the model prediction to the observation.

Figure 11. The spatial and velocity relative bias factors as a function of luminosity, compared between the mock samples constructed
from the L500 model catalogue without (red) and with (green) the satellite fraction being reduced (see the text for details). The
SDSS results are plotted in blue for comparison. The black dashed line is for the mock samples in which the total number of galaxies
was reduced (to match the luminosity function) but the satellite fraction was kept unchanged.

Since the PVD reflects the action of the local gravitational
field, the discrepancies in the PVD at the faint end
imply that the models predict too many faint galaxies
located in high-mass haloes. As we will see (Figure 12),
these are mainly satellite systems rather than the central
galaxies of their own haloes. It is thus natural to speculate that
it is these satellites that are responsible for the very large
PVD values at low luminosities. We therefore repeat the above
experiment, except that we preferentially eliminate satellites.
Satellite galaxies with $M_{^{0.1}r} > -20$ are randomly removed
until the luminosity function comes into agreement with the
observations, or until the fraction of satellite galaxies has been
reduced by 30%. In the latter case, we further
remove a number of central galaxies at random so that the
resulting catalogue has the same luminosity function as the
observations.
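
The satellite-removal experiment described above can be sketched as follows. This is one possible reading of the procedure, applied independently in each faint magnitude bin: satellites are removed first, up to a cap of 30% of the satellites in the bin, and centrals are removed only if an excess over the target counts remains. The function name, the binning and the toy catalogue are hypothetical; the target counts stand in for the observed luminosity function.

```python
import numpy as np

def reduce_catalogue(abs_mag, is_satellite, bin_edges, target_counts,
                     max_sat_cut=0.3, rng=None):
    """Return a boolean mask of the galaxies kept after the reduction.

    bin_edges must be increasing and should cover only the faint bins to be
    reduced; galaxies outside these bins are left untouched.
    """
    rng = np.random.default_rng() if rng is None else rng
    keep = np.ones(len(abs_mag), dtype=bool)
    bin_index = np.digitize(abs_mag, bin_edges) - 1
    for b, target in enumerate(target_counts):
        in_bin = np.flatnonzero(bin_index == b)
        excess = len(in_bin) - int(target)
        if excess <= 0:
            continue  # the model does not overpredict the counts in this bin
        sats = in_bin[is_satellite[in_bin]]
        cens = in_bin[~is_satellite[in_bin]]
        # Remove satellites first, but never more than the allowed cap.
        n_sat_cut = min(excess, int(max_sat_cut * len(sats)))
        if n_sat_cut > 0:
            keep[rng.choice(sats, size=n_sat_cut, replace=False)] = False
        # Remove centrals at random to absorb whatever excess remains.
        n_cen_cut = min(excess - n_sat_cut, len(cens))
        if n_cen_cut > 0:
            keep[rng.choice(cens, size=n_cen_cut, replace=False)] = False
    return keep

# Toy usage: pretend the observed counts are 20 per cent below the model.
rng = np.random.default_rng(3)
n_gal = 20000
abs_mag = rng.uniform(-22.0, -17.0, size=n_gal)
is_satellite = rng.uniform(size=n_gal) < 0.4
edges = np.array([-20.0, -19.0, -18.0, -17.0])
model_counts = np.histogram(abs_mag, bins=edges)[0]
target = (0.8 * model_counts).astype(int)
keep = reduce_catalogue(abs_mag, is_satellite, edges, target, rng=rng)
```

Setting `max_sat_cut=0.0` reproduces the spirit of the first experiment only approximately, since it removes centrals alone; a purely random removal would draw from satellites and centrals together.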

Figure 8 compares the luminosity function and the
satellite fraction for the original and the reduced model
catalogues. Figure 9 compares the power spectrum $P(k)$ and
the Fourier-space PVD $\sigma_{12}(k)$ for the original (red) and the
reduced (green) catalogues. The SDSS results are plotted
in blue for comparison. Figure 10 plots $P(k)$ and $\sigma_{12}(k)$ at
$k = 0.25, 0.5, 1, 4\,h\,{\rm Mpc}^{-1}$ as a function of absolute
magnitude. In Figure 11 we plot the results for the spatial and
velocity bias factors. As can be seen from the three figures,
both the clustering power and the PVD for faint galaxies
are reduced substantially and become consistent with
the SDSS results. For comparison, we also plot in Figure 11
(dashed black lines) the bias factors obtained from the first
experiment, in which faint galaxies are removed
at random regardless of whether they are satellites or
centrals. As can be seen, this has almost no effect on the
results. Finally, we have also investigated what happens if
we allow the fraction of satellite galaxies to be reduced by
up to 50%. The agreement with the observations is then no longer
very good; both the clustering amplitude and the peculiar
velocities are now too small.



6 ON THE NON-MONOTONIC LUMINOSITY 
DEPENDENCE OF THE PVD 

The non-monotonic luminosity dependence of the PVD
indicates that a substantial fraction of faint galaxies must reside
in high-mass dark matter haloes. In Paper II, we discussed
how the faint red satellite galaxy population in dense
environments, even though small in number, can still dominate
the PVD on small scales ($k \sim 1\,h\,{\rm Mpc}^{-1}$). Here we use the
galaxy catalogues from the semi-analytic models to check
whether this hypothesis is correct.

Figure 12. Distribution of the virial mass of host dark matter haloes for model galaxies in the reduced L500 catalogue in different
luminosity intervals, as indicated in each panel. The red solid (blue dashed) lines represent the red (blue) galaxy population, and the
yellow (green) shaded histogram shows the result for central (satellite) galaxies. The fractions of these populations are indicated in each
panel.

We first divide the model galaxies in the reduced L500
catalogue into different luminosity intervals (we use rest-frame
magnitudes for this analysis). We then divide each
luminosity sample into red and blue subsamples using a
luminosity-dependent colour cut, which is determined from
the colour-magnitude diagram of galaxies in the L500
catalogue. The colour distribution is bimodal, so the natural
place to divide the galaxies into "red" and "blue"
subpopulations is at the minimum between the two peaks in the
colour distribution.
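
A minimal sketch of such a cut is given below: in each luminosity interval the colour histogram is lightly smoothed, the two modes are located, and the divider is placed at the minimum between them. The bin number, the smoothing width, the minimum separation between the modes and the toy colour distribution are all assumptions for illustration; the text does not describe how the minimum was located in practice.

```python
import numpy as np

def bimodal_divider(colour, bins=40, smooth=3, min_sep=0.3):
    """Colour at the minimum between the two peaks of a bimodal distribution."""
    counts, edges = np.histogram(colour, bins=bins)
    centres = 0.5 * (edges[:-1] + edges[1:])
    # Moving-average smoothing to suppress bin-to-bin noise.
    smoothed = np.convolve(counts, np.ones(smooth) / smooth, mode="same")
    first = int(np.argmax(smoothed))
    # Second mode: highest bin at least min_sep away from the first mode.
    far = np.abs(centres - centres[first]) >= min_sep
    if not far.any():
        raise ValueError("no second mode found; adjust min_sep or bins")
    second = int(np.flatnonzero(far)[np.argmax(smoothed[far])])
    lo, hi = sorted((first, second))
    valley = lo + int(np.argmin(smoothed[lo:hi + 1]))
    return centres[valley]

def colour_cut_by_luminosity(abs_mag, colour, mag_edges):
    """Apply the divider separately in each magnitude interval."""
    cuts = []
    for m_lo, m_hi in zip(mag_edges[:-1], mag_edges[1:]):
        sel = (abs_mag >= m_lo) & (abs_mag < m_hi)
        cuts.append(bimodal_divider(colour[sel]))
    return np.array(cuts)

# Toy usage: hypothetical blue and red sequences with Gaussian colour scatter.
rng = np.random.default_rng(5)
colour = np.concatenate([rng.normal(0.45, 0.08, 20000),
                         rng.normal(0.90, 0.08, 20000)])
abs_mag = rng.uniform(-22.0, -18.0, size=colour.size)
cuts = colour_cut_by_luminosity(abs_mag, colour, np.array([-22.0, -20.0, -18.0]))
is_red = colour > np.interp(abs_mag, [-21.0, -19.0], cuts)
```

Interpolating the per-interval dividers in magnitude, as in the last line, is one simple way of turning them into a continuous luminosity-dependent cut.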

Figure 12 shows the distribution of the virial mass of the
host dark matter haloes for galaxies in different luminosity
intervals. Red solid and blue dashed lines are for red and
blue galaxies respectively. We also plot the results for
central and satellite galaxies (shaded histograms). The fractions
of the red and satellite populations are indicated in each panel.
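
For readers who wish to reproduce this kind of diagnostic from a model catalogue, the sketch below histograms the host halo masses of one luminosity sample separately for the red, blue, central and satellite populations and returns the corresponding fractions. The function name, the mass threshold argument and the toy catalogue are hypothetical, and the numbers it produces have no connection to the values shown in Figure 12.

```python
import numpy as np

def host_mass_summary(halo_mass, is_red, is_satellite, log_mass_bins,
                      massive_threshold=1e13):
    """Histograms of log10(M_vir) per population, plus simple fractions."""
    log_mass = np.log10(halo_mass)
    hist = {
        "red": np.histogram(log_mass[is_red], bins=log_mass_bins)[0],
        "blue": np.histogram(log_mass[~is_red], bins=log_mass_bins)[0],
        "central": np.histogram(log_mass[~is_satellite], bins=log_mass_bins)[0],
        "satellite": np.histogram(log_mass[is_satellite], bins=log_mass_bins)[0],
    }
    fractions = {
        "red": is_red.mean(),
        "satellite": is_satellite.mean(),
        "massive_host": (halo_mass > massive_threshold).mean(),
    }
    return hist, fractions

# Toy usage: centrals in low-mass haloes, satellites in a broad high-mass tail.
rng = np.random.default_rng(7)
n_gal = 10000
is_satellite = np.arange(n_gal) < 3000
log_m = np.where(is_satellite,
                 rng.normal(13.0, 0.8, n_gal),
                 rng.normal(11.5, 0.3, n_gal))
is_red = rng.uniform(size=n_gal) < np.where(is_satellite, 0.8, 0.3)
hist, fractions = host_mass_summary(10.0 ** log_m, is_red, is_satellite,
                                    log_mass_bins=np.linspace(10.0, 16.0, 25))
```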

In each luminosity interval, the distribution of host halo
virial mass shows a peak at low masses and a long tail towards
higher masses. The position of the peak moves to higher
masses for more luminous galaxies. When the galaxies are
divided into central and satellite systems, we see that the
central galaxies dominate the peak at low halo mass
and the satellite galaxies populate the tail of higher-mass
haloes. This result may provide clues to understanding
the bimodal colour distribution of galaxies. The fraction of
satellite systems in high-mass haloes increases with decreasing
galaxy luminosity up to $M_r \sim -19$, and then remains
constant at ~30% at fainter magnitudes. It is this
satellite population that gives rise to a high PVD at
faint luminosities. In the models the satellites are mainly
red, and we note that the satellite fractions predicted by the
models are in good agreement with the fractions of red galaxies
observed in the SDSS (see Table 1 of Paper I).



7 SUMMARY 

In this paper, we have compared the clustering and pairwise
velocities of galaxies in different luminosity intervals
measured from the Sloan Digital Sky Survey with results from mock
catalogues constructed using the semi-analytic models of
Kang et al. (2005) and Croton et al. (2006). We show that
the models can match a number of key features of the
luminosity dependence of the clustering and the PVD, including
the monotonic increase of the clustering amplitude with
luminosity and the non-monotonic behaviour of the PVD.

A direct look into the galaxy catalogues supports the
conclusion that a substantial fraction of faint galaxies must
reside in high-mass dark matter haloes. The luminosity
dependence of the PVD is mostly determined by how galaxies
of different luminosities are distributed among and inside dark
matter haloes. All these results are consistent with the
recent studies of Slosar et al. (2006), Tinker et al. (2006) and
van den Bosch et al. (2006), which were carried out using
halo occupation distribution (HOD) models.

We have also identified a few significant differences between
the models and the observations. The differences are
generally at the level of a few tens of per cent in both the
clustering and the velocity statistics. One difference is that
the PVD predicted by the models is systematically higher
than observed. Another difference is that the clustering
of faint galaxies, especially in the C06 model, is
significantly stronger than that observed in the SDSS. However,
we note that cosmic variance effects are still significant at
faint luminosities because the effective survey volume is
small. Significant differences also still exist between the
2dFGRS and the SDSS clustering measurements. If this
overprediction of the clustering for faint galaxies is confirmed, our
experiment in §5 shows that the fraction of faint satellite
galaxies in massive haloes will have to be reduced by
~30% in order to bring the models into better agreement
with the data. The recent study by Weinmann et al.
(2006), which compares the fractions of central and satellite
galaxies in dark haloes between the C06 model and the SDSS,
has found that the fraction of faint galaxies is too high
in massive haloes. The strong clustering found here for faint
galaxies in the model is clearly consistent with their findings.